Sparse Learning for Variable Selection with Structures and Nonlinearities
In this thesis we discuss machine learning methods performing automated
variable selection for learning sparse predictive models. There are multiple
reasons for promoting sparsity in the predictive models. By relying on a
limited set of input variables the models naturally counteract the overfitting
problem ubiquitous in learning from finite sets of training points. Sparse
models are cheaper to use for predictions: they usually require lower
computational resources and, by relying on smaller sets of inputs, can possibly
reduce costs for data collection and storage. Sparse models can also contribute
to better understanding of the investigated phenomena as they are easier to
interpret than full models.
Comment: PhD thesis
Lifelong Generative Modeling
Lifelong learning is the problem of learning multiple consecutive tasks in a
sequential manner, where knowledge gained from previous tasks is retained and
used to aid future learning over the lifetime of the learner. It is essential
to the development of intelligent machines that can adapt to their
surroundings. In this work we focus on a lifelong learning approach to
unsupervised generative modeling, where we continuously incorporate newly
observed distributions into a learned model. We do so through a student-teacher
Variational Autoencoder architecture which allows us to learn and preserve all
the distributions seen so far, without the need to retain the past data or the
past models. Through the introduction of a novel cross-model regularizer,
inspired by a Bayesian update rule, the student model leverages the information
learned by the teacher, which acts as a probabilistic knowledge store. The
regularizer reduces the effect of catastrophic interference that appears when
we learn over sequences of distributions. We validate our model's performance
on sequential variants of MNIST, FashionMNIST, PermutedMNIST, SVHN and Celeb-A
and demonstrate that our model mitigates the effects of catastrophic
interference faced by neural networks in sequential learning scenarios.
Comment: 32 pages
Continual Classification Learning Using Generative Models
Continual learning is the ability to sequentially learn over time by
accommodating knowledge while retaining previously learned experiences. Neural
networks can learn multiple tasks when trained on them jointly, but cannot
maintain performance on previously learned tasks when tasks are presented one
at a time. This problem is called catastrophic forgetting. In this work, we
propose a classification model that learns continuously from sequentially
observed tasks, while preventing catastrophic forgetting. We build on the
lifelong generative capabilities of [10] and extend it to the classification
setting by deriving a new variational bound on the joint log likelihood.
Comment: 5 pages, 4 figures, under review in Continual learning Workshop NIPS 201
Vector-Quantized Graph Auto-Encoder
In this work, we address the problem of modeling distributions of graphs.
We introduce the Vector-Quantized Graph Auto-Encoder (VQ-GAE), a
permutation-equivariant discrete auto-encoder designed to model the
distribution of graphs. By exploiting the permutation-equivariance of graph
neural networks (GNNs), our autoencoder circumvents the problem of the ordering
of the graph representation. We leverage the capability of GNNs to capture
local structures of graphs while employing vector-quantization to prevent the
mapping of discrete objects to a continuous latent space. Furthermore, the use
of autoregressive models enables us to capture the global structure of graphs
via the latent representation. We evaluate our model on standard datasets used
for graph generation and observe that it achieves excellent performance on some
of the most salient evaluation metrics compared to the state of the art.
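The vector-quantization step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a nearest-neighbour lookup under squared Euclidean distance, which is the usual form of vector quantization; the function name `quantize`, the toy codebook, and the toy node embeddings are hypothetical and are not taken from the VQ-GAE paper.

```python
def quantize(embeddings, codebook):
    """Return, for each embedding, the index of its nearest codebook vector.

    Distance is squared Euclidean; this is an illustrative assumption,
    not necessarily the exact rule used by VQ-GAE.
    """
    indices = []
    for z in embeddings:
        dists = [sum((zi - ci) ** 2 for zi, ci in zip(z, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


# Hypothetical codebook with three 2-dimensional code vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
# Hypothetical continuous node embeddings, e.g. produced by a GNN encoder.
node_embeddings = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]

codes = quantize(node_embeddings, codebook)      # discrete latent codes
quantized = [codebook[k] for k in codes]         # embeddings snapped to the codebook
```

Each node ends up represented by a discrete code index rather than a point in a continuous latent space, which is the property the abstract refers to.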
Sparse learning for variable selection with structures and nonlinearities
In this thesis we discuss machine learning methods performing automated variable selection for learning sparse predictive models. There are multiple reasons for promoting sparsity in the predictive models. By relying on a limited set of input variables the models naturally counteract the overfitting problem ubiquitous in learning from finite sets of training points. Sparse models are cheaper to use for predictions: they usually require lower computational resources and, by relying on smaller sets of inputs, can possibly reduce costs for data collection and storage. Sparse models can also contribute to better understanding of the investigated phenomena as they are easier to interpret than full models.
Graph annotation generative adversarial networks
We consider the problem of modelling high-dimensional distributions and generating new examples of data with complex relational feature structure coherent with a graph skeleton. The model we propose tackles the problem of generating the data features constrained by the specific graph structure of each data point by splitting the task into two phases. In the first phase it models the distribution of features associated with the nodes of the given graph; in the second it complements the edge features conditionally on the node features. We
follow the strategy of implicit distribution modelling via a generative adversarial network (GAN) combined with a permutation-equivariant message-passing architecture operating over the sets of nodes and edges. This enables generating the feature vectors of all the graph objects in one go (in 2 phases) as opposed to the much slower one-by-one generation of sequential models, avoids the expensive graph matching procedures usually needed for likelihood-based generative models, and uses the network capacity efficiently by being insensitive to the particular node ordering in the graph representation. To the best of our knowledge, this is the first method that models the feature distribution along the graph skeleton, allowing for generation of annotated graphs with user-specified structures. Our experiments demonstrate the ability of our model to learn complex structured distributions through quantitative evaluation over three annotated graph datasets.
Functional learning of time-series models preserving Granger-causality structures
We develop a functional learning approach to modelling systems of time series which preserves the ability of standard linear time-series models (VARs) to uncover the Granger-causality links between the series of the system while allowing for richer functional relationships. We propose a framework for learning multiple output-kernels associated with multiple input-kernels over a structured input space and outline an algorithm for learning the kernels simultaneously with the model parameters under various forms of regularization, including non-smooth sparsity-inducing norms. We present results of synthetic experiments illustrating the benefits of the described approach.
Structured nonlinear variable selection
We investigate structured sparsity methods for variable selection in regression problems where the target depends nonlinearly on the inputs. We focus on general nonlinear functions, without a priori limiting the function space to additive models. We propose two new regularizers based on partial derivatives as nonlinear equivalents of group lasso and elastic net. We formulate the problem within the framework of learning in reproducing kernel Hilbert spaces and show how the variational problem can be reformulated into a more practical finite-dimensional equivalent. We develop a new algorithm derived from the ADMM principles that relies solely on closed forms of the proximal operators. We explore the empirical properties of our new algorithm for Nonlinear Variable Selection based on Derivatives (NVSD) on a set of experiments and confirm favourable properties of our structured-sparsity models and the algorithm in terms of both prediction and variable selection accuracy.
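To give a sense of what a closed-form proximal operator looks like, the sketch below implements block soft-thresholding, the standard proximal operator of the (linear) group lasso penalty. It is only an illustration of the kind of closed form an ADMM scheme can rely on; it is not the derivative-based regularizer or the NVSD algorithm from the abstract, and the names `prox_group_lasso`, `groups`, and `lam` are hypothetical.

```python
import math


def prox_group_lasso(v, groups, lam):
    """Block soft-thresholding: prox of lam * sum_g ||v_g||_2.

    Each group g is shrunk towards zero by the factor max(0, 1 - lam / ||v_g||),
    so groups whose norm is at most lam are set to zero as a whole.
    """
    out = list(v)
    for g in groups:
        norm = math.sqrt(sum(v[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * v[i]
    return out


# Hypothetical coefficient vector split into two groups of variables.
v = [3.0, 4.0, 0.3, -0.4]
groups = [[0, 1], [2, 3]]
shrunk = prox_group_lasso(v, groups, lam=1.0)
```

With these toy values the first group (norm 5) is scaled by 0.8, while the second group (norm 0.5, below the threshold) is zeroed out entirely, which is how group-level sparsity arises.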
Learning Predictive Leading Indicators for Forecasting Time Series Systems with Unknown Clusters of Forecast Tasks
We present a new method for forecasting systems of multiple interrelated time series. The method learns the forecast models while discovering leading indicators from within the system, which serve as good predictors that improve forecast accuracy, and a cluster structure of the predictive tasks around these indicators. The method is based on the classical linear vector autoregressive model (VAR) and links the discovery of the leading indicators to inferring sparse graphs of Granger causality. We formulate a new constrained optimisation problem to promote the desired sparse structures across the models and the sharing of information amongst the learning tasks in a multi-task manner. We propose an algorithm for solving the problem and document, on a battery of synthetic and real-data experiments, the advantages of our new method over baseline VAR models as well as state-of-the-art sparse VAR learning methods.
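The link between a sparse VAR and a Granger-causality graph can be sketched as follows. The example assumes a VAR of order 1 with an already-learned coefficient matrix `A`, where a nonzero entry `A[i][j]` means that series `j` helps predict series `i`. The matrix values and the helper names `var1_forecast` and `granger_edges` are hypothetical, and the learning algorithm proposed in the abstract is not reproduced here.

```python
def var1_forecast(A, x):
    """One-step-ahead VAR(1) forecast: x_next[i] = sum_j A[i][j] * x[j]."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def granger_edges(A, tol=1e-8):
    """List edges (j, i) meaning 'series j Granger-causes series i'.

    Self-loops (a series predicting itself) are skipped.
    """
    return [
        (j, i)
        for i, row in enumerate(A)
        for j, a in enumerate(row)
        if i != j and abs(a) > tol
    ]


# Hypothetical sparse coefficient matrix for a system of three series.
A = [
    [0.5, 0.0, 0.3],
    [0.0, 0.4, 0.2],
    [0.0, 0.0, 0.6],
]

edges = granger_edges(A)                      # which series drive which
forecast = var1_forecast(A, [1.0, 2.0, 3.0])  # forecast from the latest observation
```

In this toy system, series 2 appears as the source of every off-diagonal edge, so it plays the role of a leading indicator for series 0 and 1.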